Abstract

Least Squares Twin Support Vector Machine (LSTSVM) is a fast and
efficient version of the SVM algorithm for binary classification.
LSTSVM combines the ideas of Least Squares SVM and Twin SVM: two
non-parallel hyperplanes are found by solving two systems of linear
equations. Although the algorithm is fast and efficient in many
classification tasks, it is unable to cope with two features of
real-world problems. First, in many real-world classification problems
it is almost impossible to assign data points to a single class.
Second, data points in real-world problems may have different
importance. In this study, we propose a novel version of LSTSVM based
on fuzzy concepts to deal with these two characteristics of real-world
data. The algorithm, called Fuzzy LSTSVM (FLSTSVM), provides more
flexibility than the binary classification of LSTSVM. Two models are
proposed for the algorithm. In the first model, a fuzzy membership
value is assigned to each data point and the hyperplanes are optimized
based on these fuzzy samples. In the second model, we construct fuzzy
hyperplanes to classify data. Finally, we apply the proposed FLSTSVM to
an artificial dataset and four real-world datasets. The results
demonstrate that FLSTSVM obtains better performance than SVM and
LSTSVM.

Keywords Fuzzy sets · Least squares twin support vector machine ·
Fuzzy hyperplane · Classification


1 Introduction 

Support Vector Machine (SVM) is a classification technique based on
the idea of Structural Risk Minimization (SRM). It is a kernel-based
classifier that was first introduced in 1995 by Vapnik and his
colleagues at AT&T Bell Laboratories [1]. The algorithm has been used
in many classification tasks owing to its success in recognizing
handwritten characters, where it outperformed carefully trained neural
networks. Some of these tasks are text classification [2], image
classification [3], and bioinformatics [4,5].

One of the newest versions of SVM is the Least Squares Twin Support
Vector Machine (LSTSVM), introduced in 2009 [6]. The algorithm combines
the ideas of Least Squares SVM (LSSVM) [7] and Twin SVM (TSVM) [8].
Although LSTSVM provides high accuracy in some classification tasks
[6], it still suffers from two main drawbacks: (I) in real-world
applications a data point may not belong entirely to one class, while
LSTSVM strictly assigns each data point to a single class; (II) in
many classification tasks data points have different importance, while
LSTSVM gives all data points the same priority.

Many real-world applications require different degrees of importance
for the input data. In such cases, the main concern is how to determine
the final classes when different importance degrees are assigned to the
training data. Moreover, the classifier should be designed so that it
can separate noise from the data. A good approach to cope with these
challenges is to use the concept of fuzzy functions.

Fuzzy theory is useful for analyzing complex processes that are hard to
handle with standard quantitative methods, or for cases in which the
available information is uncertain. A fuzzy function can represent
uncertainty in data structures using fuzzy parameters. In the
literature, the concepts of fuzzy function and fuzzy operations have
been introduced by different researchers [9-13]. A fuzzy function
offers an efficient way of capturing the inexact nature of real-world
problems.

In this paper we incorporate fuzzy set theory into the LSTSVM model.
Unlike the standard LSTSVM, the proposed fuzzy LSTSVM treats training
data points according to their importance degrees during the training
phase. Several approaches to applying fuzzy sets in SVM have been
proposed in the literature [14-18]. The key feature of the proposed
fuzzy LSTSVM is that it assigns fuzzy membership values to data points
based on their importance degrees. In addition, we use fuzzy numbers to
set the parameters of the fuzzy LSTSVM model, such as the weight vector
and the bias term. Using these two features, we propose two models for
fuzzy LSTSVM.

The rest of this paper is organized as follows. A brief review of basic
concepts, including SVM, TSVM, and LSTSVM, is presented in Section 2.
The proposed models for fuzzy LSTSVM are introduced in Section 3. In
Section 4 we evaluate the proposed models, and finally, Section 5
concludes the paper.


2 Basic Concepts 

In this section a quick review of different versions of 
SVM is presented, namely the standard SVM, TSVM, 
and LSTSVM. 


2.1 Support Vector Machine 

The main idea behind SVM is to minimize the classification error while
preserving the maximum possible margin between classes. Suppose we are
given a set of training data points $x_i \in \mathbb{R}^d$,
$i = 1, \dots, n$, with labels $y_i \in \{-1, +1\}$. SVM seeks a
hyperplane with equation $w \cdot x + b = 0$ that satisfies the
following constraints:

$$y_i (w \cdot x_i + b) \geq 1, \quad \forall i, \tag{1}$$

where $w$ is the weight vector. Such a hyperplane can be obtained by
solving Eq. (2):

$$\begin{aligned}
\text{Minimize} \quad & f(x) = \frac{\|w\|^2}{2} \\
\text{subject to} \quad & y_i (w \cdot x_i + b) - 1 \geq 0
\end{aligned} \tag{2}$$

The geometric interpretation of this formulation is depicted in Fig. 1
for a toy example.

Fig. 1: Geometric interpretation of SVM


2.2 Twin Support Vector Machine

In SVM only one hyperplane partitions the samples into the positive and
negative classes. In 2007, Jayadeva et al. [8] proposed TSVM, which
uses two hyperplanes and assigns each sample to a class according to
its distance from the hyperplanes. The main equations of TSVM are as
follows:

$$\begin{aligned}
x_i w^{(1)} + b^{(1)} &= 0 \\
x_i w^{(2)} + b^{(2)} &= 0
\end{aligned} \tag{3}$$

where $w^{(i)}$ and $b^{(i)}$ are the weight vector and bias term of
the $i$-th hyperplane, respectively. Each hyperplane represents the
samples of its class. This concept is geometrically depicted in Fig. 2
for a toy example. In TSVM, the two hyperplanes are non-parallel. Each
of them is closest to the samples of its own class and farthest from
the samples of the opposite class [19,20]. Let $A$ and $B$ denote the
matrices of data points of class +1 and class -1, respectively. The two
hyperplanes are obtained by solving Eq. (4) and Eq. (5).

Fig. 2: Geometric interpretation of Twin SVM

$$\begin{aligned}
\text{Minimize} \quad & \frac{1}{2}(A w^{(1)} + e_1 b^{(1)})^T (A w^{(1)} + e_1 b^{(1)}) + p_1 e_2^T \xi \\
\text{w.r.t.} \quad & w^{(1)}, b^{(1)}, \xi \\
\text{subject to} \quad & -(B w^{(1)} + e_2 b^{(1)}) + \xi \geq e_2, \quad \xi \geq 0
\end{aligned} \tag{4}$$

$$\begin{aligned}
\text{Minimize} \quad & \frac{1}{2}(B w^{(2)} + e_2 b^{(2)})^T (B w^{(2)} + e_2 b^{(2)}) + p_2 e_1^T \xi \\
\text{w.r.t.} \quad & w^{(2)}, b^{(2)}, \xi \\
\text{subject to} \quad & (A w^{(2)} + e_1 b^{(2)}) + \xi \geq e_1, \quad \xi \geq 0
\end{aligned} \tag{5}$$

In these equations $\xi$ represents the slack variables, $e_i$
($i \in \{1, 2\}$) is a column vector of ones of the appropriate
length, and $p_1$ and $p_2$ are penalty parameters.

2.3 Least Squares Twin Support Vector Machine

LSTSVM [6, 21] is a binary classifier which combines the ideas of LSSVM
and TSVM. In other words, LSTSVM uses the least squares of the errors
to turn the inequality constraints of TSVM into equality constraints,
and solves a set of linear equations rather than two Quadratic
Programming Problems (QPPs). Experiments have shown that LSTSVM can
considerably reduce the training time while preserving competitive
classification accuracy [7, 22]. Furthermore, the time complexity of
SVM is of order $m^3$, where $m$ is the number of constraints. When
there are equal numbers of positive and negative samples, each of the
two smaller problems involves only $m/2$ constraints, so in theory the
algorithm is faster than the standard SVM by a factor of four
($2 (m/2)^3 = m^3/4$).

LSTSVM finds its hyperplanes by minimizing Eq. (6) and Eq. (7), which
are linearly solvable. By solving Eq. (6) and Eq. (7), the values of
$w$ and $b$ for each hyperplane are obtained according to Eq. (8) and
Eq. (9).

$$\begin{aligned}
\text{Minimize} \quad & \frac{1}{2}(A w^{(1)} + e b^{(1)})^T (A w^{(1)} + e b^{(1)}) + \frac{p_1}{2} \xi^T \xi \\
\text{w.r.t.} \quad & w^{(1)}, b^{(1)}, \xi \\
\text{subject to} \quad & -(B w^{(1)} + e b^{(1)}) + \xi = e
\end{aligned} \tag{6}$$

$$\begin{aligned}
\text{Minimize} \quad & \frac{1}{2}(B w^{(2)} + e b^{(2)})^T (B w^{(2)} + e b^{(2)}) + \frac{p_2}{2} \xi^T \xi \\
\text{w.r.t.} \quad & w^{(2)}, b^{(2)}, \xi \\
\text{subject to} \quad & (A w^{(2)} + e b^{(2)}) + \xi = e
\end{aligned} \tag{7}$$

$$\begin{bmatrix} w^{(1)} \\ b^{(1)} \end{bmatrix} = -\left(F^T F + \frac{1}{p_1} E^T E\right)^{-1} F^T e \tag{8}$$

$$\begin{bmatrix} w^{(2)} \\ b^{(2)} \end{bmatrix} = \left(E^T E + \frac{1}{p_2} F^T F\right)^{-1} E^T e \tag{9}$$

where $E = [A \;\; e]$ and $F = [B \;\; e]$, and $e$ and $\xi$ are as
introduced in Section 2.2.
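The closed-form solutions in Eq. (8) and Eq. (9) can be illustrated with a short sketch. The following is a minimal pure-Python version for the linear case, not the authors' Matlab implementation. The function names (`solve`, `gram`, `col_sums`, `train_lstsvm`, `distance`, `predict`) are hypothetical, and the decision rule (assign a point to the class whose hyperplane is nearer in perpendicular distance) is the usual twin-SVM rule, assumed here.

```python
import math

def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def gram(P, Q):
    """Return P^T Q for two matrices stored as lists of rows."""
    return [[sum(p[i] * q[j] for p, q in zip(P, Q)) for j in range(len(Q[0]))]
            for i in range(len(P[0]))]

def col_sums(P):
    """Return P^T e, i.e. the column sums of P."""
    return [sum(row[j] for row in P) for j in range(len(P[0]))]

def train_lstsvm(A, B, p1=1.0, p2=1.0):
    """Return (u1, u2), where each u = [w_1, ..., w_d, b]."""
    E = [list(r) + [1.0] for r in A]  # E = [A e]
    F = [list(r) + [1.0] for r in B]  # F = [B e]
    EtE, FtF = gram(E, E), gram(F, F)
    k = len(EtE)
    # Eq. (8): u1 = -(F^T F + (1/p1) E^T E)^{-1} F^T e
    M1 = [[FtF[i][j] + EtE[i][j] / p1 for j in range(k)] for i in range(k)]
    u1 = [-v for v in solve(M1, col_sums(F))]
    # Eq. (9): u2 = (E^T E + (1/p2) F^T F)^{-1} E^T e
    M2 = [[EtE[i][j] + FtF[i][j] / p2 for j in range(k)] for i in range(k)]
    u2 = solve(M2, col_sums(E))
    return u1, u2

def distance(x, u):
    """Perpendicular distance from x to the hyperplane w.x + b = 0."""
    w, b = u[:-1], u[-1]
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def predict(x, u1, u2):
    """Assumed rule: +1 if x is nearer to hyperplane 1, otherwise -1."""
    return 1 if distance(x, u1) <= distance(x, u2) else -1
```

For example, with class +1 points on the line y = 0 and class -1 points on the line y = 2, `train_lstsvm` returns a first hyperplane that passes through the positive points and a second one that passes through the negative points, so `predict` separates the two groups.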


3 Fuzzy Least Squares Twin Support Vector 
Machine 

In this section, we first explain the importance of fuzzy
classification and then introduce two approaches for improving LSTSVM
using fuzzy set theory. The basic notation used in this section is as
follows: samples of the positive and negative classes are represented
by matrices $A$ and $B$, respectively. $A$ contains $m_1$ positive
samples and $B$ contains $m_2$ negative samples. Membership degrees are
represented by $\mu$ and slack variables are represented by the vector
$\xi$. All equations are presented in matrix form, where for each
matrix $M$ its transpose is denoted by $M^T$. $e$ is a vector of
appropriate size whose elements are all equal to 1.

3.1 Fuzzy Classification 

In many real-world applications a sample in the training data does not
belong exactly to a single class. Furthermore, in some applications it
is desirable for new training samples to have higher importance than
older ones. Given the uncertainty of assigning such importance values,
fuzzy sets provide an elegant way to cope with this problem. We can
define a fuzzy membership degree $\mu_i$ for each sample in the
training data. A membership degree is a number between 0 and 1 which
can be considered a measure of the influence of the sample on the final
class. Therefore, a training sample with membership degree $\mu_i$
influences class +1 by $\mu_i$ and influences class -1 by
$(1 - \mu_i)$. In addition, using fuzzy membership functions, it is
possible to assign a membership degree to each sample based on its
entry time. Sequential learning [23] is another application which
motivates applying fuzzy concepts in classification algorithms such as
SVM.
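As an illustration of a membership degree based on entry time, the sketch below maps timestamps linearly onto the interval [floor, 1], so that the newest sample receives membership 1. The linear form, the `floor` parameter, and the function name `time_memberships` are assumptions made for this example; the paper does not prescribe a particular membership function.

```python
def time_memberships(timestamps, floor=0.1):
    """Map entry times to membership degrees in [floor, 1]; newest gets 1."""
    if not 0.0 < floor <= 1.0:
        raise ValueError("floor must be in (0, 1]")
    t_min, t_max = min(timestamps), max(timestamps)
    if t_max == t_min:
        # All samples arrived at the same time: treat them as equally important.
        return [1.0 for _ in timestamps]
    scale = (1.0 - floor) / (t_max - t_min)
    return [floor + scale * (t - t_min) for t in timestamps]
```

A strictly positive `floor` keeps the oldest samples from being ignored entirely; other monotone functions of time would serve equally well.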

In 2008 Pei-Yi Hao introduced fuzzy SVM [18]. In his paper, he
introduced two approaches, M1 and M2, for applying fuzzy sets in SVM.
In the first model, M1, he constructed a crisp hyperplane and assigned
a fuzzy membership to each data point. In the second model, M2, he
constructed a fuzzy hyperplane to discriminate the classes. In the
following sections, we integrate fuzzy set theory into the LSTSVM
algorithm in accordance with [18].


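3.2 Fuzzy LSTSVM: Model M1

In this model, fuzzy membership values are assigned to data points such
that noise and outliers receive small membership values. Our goal is to
construct two crisp hyperplanes that distinguish the target classes. To
use this model in the LSTSVM algorithm, we rewrite Eq. (6) and Eq. (7)
in the form of Eq. (10) and Eq. (11):

$$\begin{aligned}
\text{Minimize} \quad & J_1 = \frac{1}{2}(A w^{(1)} + e b^{(1)})^T (A w^{(1)} + e b^{(1)}) + \frac{p_1}{2} \mu \, \xi^T \xi \\
\text{w.r.t.} \quad & w^{(1)}, b^{(1)}, \xi \\
\text{subject to} \quad & -(B w^{(1)} + e b^{(1)}) + \xi = e
\end{aligned} \tag{10}$$

$$\begin{aligned}
\text{Minimize} \quad & J_2 = \frac{1}{2}(B w^{(2)} + e b^{(2)})^T (B w^{(2)} + e b^{(2)}) + \frac{p_2}{2} \mu \, \xi^T \xi \\
\text{w.r.t.} \quad & w^{(2)}, b^{(2)}, \xi \\
\text{subject to} \quad & -(A w^{(2)} + e b^{(2)}) + \xi = e
\end{aligned} \tag{11}$$

Eq. (10) and Eq. (11) define the hyperplanes of the positive and the
negative class, respectively. In these two equations the membership
degree $\mu$ appears only as a coefficient of the error term.

By solving the constraints for $\xi$ and substituting it into Eq. (10)
and Eq. (11), the two problems are reformulated as Eq. (12) and
Eq. (13):

$$\text{Minimize} \quad J_1 = \frac{1}{2}\|A w^{(1)} + e b^{(1)}\|^2 + \frac{p_1}{2} \mu \|B w^{(1)} + e b^{(1)} + e\|^2 \quad \text{w.r.t. } w^{(1)}, b^{(1)} \tag{12}$$

$$\text{Minimize} \quad J_2 = \frac{1}{2}\|B w^{(2)} + e b^{(2)}\|^2 + \frac{p_2}{2} \mu \|A w^{(2)} + e b^{(2)} + e\|^2 \quad \text{w.r.t. } w^{(2)}, b^{(2)} \tag{13}$$

By differentiating Eq. (12) and Eq. (13) with respect to $w$ and $b$,
we have:

$$\begin{aligned}
\frac{\partial J_1}{\partial w^{(1)}} &= A^T (A w^{(1)} + e b^{(1)}) + p_1 \mu B^T (B w^{(1)} + e b^{(1)} + e) = 0 \\
\frac{\partial J_1}{\partial b^{(1)}} &= e^T (A w^{(1)} + e b^{(1)}) + p_1 \mu e^T (B w^{(1)} + e b^{(1)} + e) = 0 \\
\frac{\partial J_2}{\partial w^{(2)}} &= B^T (B w^{(2)} + e b^{(2)}) + p_2 \mu A^T (A w^{(2)} + e b^{(2)} + e) = 0 \\
\frac{\partial J_2}{\partial b^{(2)}} &= e^T (B w^{(2)} + e b^{(2)}) + p_2 \mu e^T (A w^{(2)} + e b^{(2)} + e) = 0
\end{aligned}$$

By solving the above equations using some matrix algebra, we obtain
Eq. (14) and Eq. (15) as the equations for $J_1$ and $J_2$,
respectively:

$$\begin{bmatrix} \mu B^T B & \mu B^T e \\ \mu e^T B & \mu m_2 \end{bmatrix} \begin{bmatrix} w^{(1)} \\ b^{(1)} \end{bmatrix} + \frac{1}{p_1} \begin{bmatrix} A^T A & A^T e \\ e^T A & m_1 \end{bmatrix} \begin{bmatrix} w^{(1)} \\ b^{(1)} \end{bmatrix} + \begin{bmatrix} \mu B^T e \\ \mu m_2 \end{bmatrix} = 0 \tag{14}$$

$$\begin{bmatrix} \mu A^T A & \mu A^T e \\ \mu e^T A & \mu m_1 \end{bmatrix} \begin{bmatrix} w^{(2)} \\ b^{(2)} \end{bmatrix} + \frac{1}{p_2} \begin{bmatrix} B^T B & B^T e \\ e^T B & m_2 \end{bmatrix} \begin{bmatrix} w^{(2)} \\ b^{(2)} \end{bmatrix} + \begin{bmatrix} \mu A^T e \\ \mu m_1 \end{bmatrix} = 0 \tag{15}$$

These two equations can be represented as Eq. (16) and Eq. (17),
respectively:

$$\begin{bmatrix} w^{(1)} \\ b^{(1)} \end{bmatrix} = \begin{bmatrix} \mu B^T B + \frac{1}{p_1} A^T A & \mu B^T e + \frac{1}{p_1} A^T e \\ \mu e^T B + \frac{1}{p_1} e^T A & \mu m_2 + \frac{1}{p_1} m_1 \end{bmatrix}^{-1} \begin{bmatrix} -\mu B^T e \\ -\mu m_2 \end{bmatrix} \tag{16}$$

$$\begin{bmatrix} w^{(2)} \\ b^{(2)} \end{bmatrix} = \begin{bmatrix} \mu A^T A + \frac{1}{p_2} B^T B & \mu A^T e + \frac{1}{p_2} B^T e \\ \mu e^T A + \frac{1}{p_2} e^T B & \mu m_1 + \frac{1}{p_2} m_2 \end{bmatrix}^{-1} \begin{bmatrix} -\mu A^T e \\ -\mu m_1 \end{bmatrix} \tag{17}$$

Once the values of $w^{(1)}$, $b^{(1)}$, $w^{(2)}$, and $b^{(2)}$ are
obtained, a new data point is assigned to a class based on its distance
from the hyperplane of the corresponding class.


3.3 Fuzzy LSTSVM: Model M2

In this model, we construct fuzzy hyperplanes to discriminate the
classes. In M2, all parameters of the model, even the components of the
weight vector $w$, are fuzzy numbers. For computational simplicity, all
parameters used in this work are restricted to the class of symmetric
triangular membership functions. For a symmetric triangular fuzzy
number $V = (o, r)$, $o$ is the center and $r$ is the width of the
corresponding membership function.

Let us assume $W$ and $B$ are the fuzzy weight vector and fuzzy bias
term, respectively, where each component of $W$ is written as
$W_i = (w_i, c_i)$ and $B = (b, d)$. Then the equation of a fuzzy
hyperplane is defined as follows:

$$W \cdot x + B = \langle w_1, c_1 \rangle x_1 + \dots + \langle w_n, c_n \rangle x_n + \langle b, d \rangle = 0 \tag{18}$$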
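The symmetric triangular fuzzy numbers used in model M2 can be illustrated with a short sketch. It assumes that the width r of V = (o, r) is the half-width (spread) of the triangle, which the text does not state explicitly. It uses the standard arithmetic of symmetric triangular fuzzy numbers: multiplying (o, r) by a crisp scalar k gives (k o, |k| r), and adding two such numbers adds their centers and their spreads. The function names `tri_membership` and `fuzzy_affine` are hypothetical.

```python
def tri_membership(v, center, spread):
    """Membership of v in the symmetric triangular fuzzy number (center, spread)."""
    if spread <= 0:
        # Degenerate case: a crisp number.
        return 1.0 if v == center else 0.0
    return max(0.0, 1.0 - abs(v - center) / spread)

def fuzzy_affine(x, w, c, b, d):
    """Evaluate W.x + B for W_i = (w_i, c_i), B = (b, d) at a crisp point x.

    Returns the (center, spread) of the resulting symmetric triangular number.
    """
    center = sum(wi * xi for wi, xi in zip(w, x)) + b
    spread = sum(ci * abs(xi) for ci, xi in zip(c, x)) + d
    return center, spread
```

With all spreads set to zero, `fuzzy_affine` reduces to the crisp value w.x + b, which is how the fuzzy hyperplane generalizes the crisp one.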


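To find the fuzzy hyperplane for class +1 of our fuzzy LSTSVM, we
rewrite Eq. (6) as:

$$\begin{aligned}
\text{Minimize} \quad & J = \frac{1}{2}(A w^{(1)} + e b^{(1)})^T (A w^{(1)} + e b^{(1)}) + \frac{p_1}{2} \mu \, \xi^T \xi + M\left(\frac{1}{2}\|c^{(1)}\|^2 + d^{(1)}\right) \\
\text{w.r.t.} \quad & w^{(1)}, b^{(1)}, c^{(1)}, d^{(1)}, \xi \\
\text{subject to} \quad & -\left((B w^{(1)}) + e b^{(1)}\right) = e - \xi
\end{aligned} \tag{19}$$

In this equation, $\frac{1}{2}\|c^{(1)}\|^2 + d^{(1)}$ measures the
vagueness of the model. As the vagueness of the model increases, the
results become more inexact. In Eq. (19), $M$ is a control parameter
chosen by the user. The term $\mu \, \xi^T \xi$ determines the amount
of least squares error, where $\mu$ is the membership degree of the
positive sample and $\xi$ is the slack variable vector. $p_1$ is a
trade-off parameter which controls the effect of the least squares
error on the hyperplane.

Eq. (19) can be rewritten as Eq. (20):

$$\begin{aligned}
\text{Minimize} \quad & J = \frac{1}{2}\|(A w^{(1)}) + e b^{(1)} + (A c^{(1)}) + e d^{(1)}\|^2 + \frac{p_1}{2} \mu \|(B w^{(1)}) + e b^{(1)} + e\|^2 + M\left(\frac{1}{2}\|c^{(1)}\|^2 + d^{(1)}\right) \\
\text{w.r.t.} \quad & w^{(1)}, b^{(1)}, c^{(1)}, d^{(1)}
\end{aligned} \tag{20}$$

Setting the derivatives of Eq. (20) with respect to $w^{(1)}$,
$b^{(1)}$, $c^{(1)}$, and $d^{(1)}$ equal to zero and solving the
resulting equations yields the following system, where $I$ is the
identity matrix:

$$\begin{bmatrix} w^{(1)} \\ b^{(1)} \\ c^{(1)} \\ d^{(1)} \end{bmatrix} = -\begin{bmatrix}
\frac{1}{p_1} A^T A + \mu B^T B & \frac{1}{p_1} A^T e + \mu B^T e & \frac{1}{p_1} A^T A & \frac{1}{p_1} A^T e \\
\frac{1}{p_1} e^T A + \mu e^T B & \frac{1}{p_1} m_1 + \mu m_2 & \frac{1}{p_1} e^T A & \frac{1}{p_1} m_1 \\
A^T A & A^T e & A^T A + M I & A^T e \\
e^T A & m_1 & e^T A & m_1
\end{bmatrix}^{-1} \begin{bmatrix} \mu B^T e \\ \mu m_2 \\ 0 \\ M \end{bmatrix} \tag{21}$$

Up to now we have found all the necessary parameters of the first fuzzy
hyperplane. By substituting the values of these parameters into
Eq. (18), we obtain the equation of the first fuzzy hyperplane.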
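The step from Eq. (20) to Eq. (21) can be made explicit by writing out the stationarity conditions. The block below is a worked sketch that assumes Eq. (20) has the form shown on its first line and treats $\mu$ as a scalar; the shorthand symbols $r$ and $s$ are introduced here only for readability and do not appear in the paper.

```latex
\begin{aligned}
J &= \tfrac{1}{2}\|r\|^2 + \tfrac{p_1}{2}\mu\|s\|^2
     + M\left(\tfrac{1}{2}\|c^{(1)}\|^2 + d^{(1)}\right), \\
r &= A w^{(1)} + e b^{(1)} + A c^{(1)} + e d^{(1)}, \qquad
s = B w^{(1)} + e b^{(1)} + e, \\[4pt]
\frac{\partial J}{\partial w^{(1)}} &= A^T r + p_1 \mu B^T s = 0, \\
\frac{\partial J}{\partial b^{(1)}} &= e^T r + p_1 \mu\, e^T s = 0, \\
\frac{\partial J}{\partial c^{(1)}} &= A^T r + M c^{(1)} = 0, \\
\frac{\partial J}{\partial d^{(1)}} &= e^T r + M = 0.
\end{aligned}
```

Dividing the first two conditions by $p_1$, expanding $r$ and $s$, and collecting the coefficients of $w^{(1)}$, $b^{(1)}$, $c^{(1)}$, and $d^{(1)}$ gives one block row of the linear system per condition, with the constant terms $\mu B^T e$, $\mu m_2$, $0$, and $M$ moved to the right-hand side.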

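For the second hyperplane, the equations are as follows:

$$\begin{aligned}
\text{Minimize} \quad & J = \frac{1}{2}(B w^{(2)} + e b^{(2)})^T (B w^{(2)} + e b^{(2)}) + \frac{p_2}{2} \mu \, \xi^T \xi + M\left(\frac{1}{2}\|c^{(2)}\|^2 + d^{(2)}\right) \\
\text{w.r.t.} \quad & w^{(2)}, b^{(2)}, c^{(2)}, d^{(2)}, \xi \\
\text{subject to} \quad & -\left((A w^{(2)}) + e b^{(2)}\right) + \xi = e
\end{aligned} \tag{22}$$

This can be rewritten as:

$$\begin{aligned}
\text{Minimize} \quad & J = \frac{1}{2}\|(B w^{(2)}) + e b^{(2)} + (B c^{(2)}) + e d^{(2)}\|^2 + \frac{p_2}{2} \mu \|(A w^{(2)}) + e b^{(2)} + e\|^2 + M\left(\frac{1}{2}\|c^{(2)}\|^2 + d^{(2)}\right) \\
\text{w.r.t.} \quad & w^{(2)}, b^{(2)}, c^{(2)}, d^{(2)}
\end{aligned} \tag{23}$$

Hence we have, with $I$ denoting the identity matrix:

$$\begin{bmatrix} w^{(2)} \\ b^{(2)} \\ c^{(2)} \\ d^{(2)} \end{bmatrix} = -\begin{bmatrix}
\mu A^T A + \frac{1}{p_2} B^T B & \mu A^T e + \frac{1}{p_2} B^T e & \frac{1}{p_2} B^T B & \frac{1}{p_2} B^T e \\
\mu e^T A + \frac{1}{p_2} e^T B & \mu m_1 + \frac{1}{p_2} m_2 & \frac{1}{p_2} e^T B & \frac{1}{p_2} m_2 \\
B^T B & B^T e & B^T B + M I & B^T e \\
e^T B & m_2 & e^T B & m_2
\end{bmatrix}^{-1} \begin{bmatrix} \mu A^T e \\ \mu m_1 \\ 0 \\ M \end{bmatrix} \tag{24}$$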


By solving Eq. (24) and finding the values of the parameters
$w^{(2)}$, $b^{(2)}$, $c^{(2)}$, and $d^{(2)}$, the equation of the
second fuzzy hyperplane can be obtained using Eq. (18). After finding
the equations of the two fuzzy hyperplanes, the fuzzy distance between
a given test data point and the fuzzy hyperplanes should be calculated.
Definition 1 shows how to find the fuzzy distance between a data point
and a fuzzy hyperplane.

Definition 1: $\Delta = (\delta, \gamma)$ is the fuzzy distance between
a data point $x_0 = (x_{01}, \dots, x_{0n})$ and the fuzzy hyperplane
$W \cdot x + B$, where

$$\delta = \frac{|w_1 x_{01} + \dots + w_n x_{0n} + b|}{\|W\|}$$

and

$$\gamma = \frac{|(w_1 + c_1) x_{01} + \dots + (w_n + c_n) x_{0n}|}{\|W\|}$$

After finding the fuzzy distances between the data point and the fuzzy
hyperplanes, it is necessary to define a fuzzy membership function
which determines the membership degree of the data point in each fuzzy
hyperplane. Let us assume that $\Delta_1 = (\delta_1, \gamma_1)$ and
$\Delta_2 = (\delta_2, \gamma_2)$ are the fuzzy distances between a
data point and the two hyperplanes $H_1$ and $H_2$, respectively. Then
for an input data point $x_0$, the degree to which $x_0$ belongs to
hyperplane $H_1$ is defined by the following membership function (once
the membership degrees for $H_1$ are found, the membership degrees for
$H_2$ are easily obtained):

$$\mu_1(x_0) = \begin{cases}
1 - \dfrac{\delta_1 + \gamma_1}{\delta_1 + \gamma_1 + \delta_2 + \gamma_2}, & \delta_1 \geq \gamma_1,\ \delta_2 \geq \gamma_2, \\[6pt]
1 - \dfrac{\delta_1}{\delta_1 + \delta_2 + \gamma_2}, & \delta_1 < \gamma_1,\ \delta_2 \geq \gamma_2, \\[6pt]
1 - \dfrac{\delta_1 + \gamma_1}{\delta_1 + \gamma_1 + \delta_2}, & \delta_1 \geq \gamma_1,\ \delta_2 < \gamma_2, \\[6pt]
1 - \dfrac{\delta_1}{\delta_1 + \delta_2}, & \delta_1 < \gamma_1,\ \delta_2 < \gamma_2.
\end{cases} \tag{25}$$

4 Numerical Experiments 

To evaluate the performance of the proposed algorithm, we investigate
its classification accuracy on both artificial and benchmark datasets.
All experiments are carried out in the Matlab 7.9 (R2009b) environment
on a PC with an Intel processor (2.9 GHz) and 2 GB of RAM. The accuracy
used to evaluate a classifier is defined as
(TP + TN)/(TP + FP + TN + FN), where TP, TN, FP and FN are the numbers
of true positives, true negatives, false positives and false negatives,
respectively. The accuracies are measured by the standard 10-fold
cross-validation methodology [24]. In our implementation, we focus on
the comparison between SVM, LSTSVM and FLSTSVM with model M2.
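The evaluation protocol described above can be sketched as follows. The accuracy function follows the definition in the text. The fold construction (contiguous folds, no shuffling or stratification) is a simplifying assumption, since the paper does not state how the folds were formed, and the function names `accuracy` and `kfold_indices` are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion counts must not all be zero")
    return (tp + tn) / total

def kfold_indices(n, k=10):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

Each fold serves once as the test set while the remaining folds form the training set, and the reported accuracy is the average over the k runs.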


4.1 Experiment on artificial dataset 

We first consider a two-dimensional "Xor" dataset, which is a very
common dataset for evaluating the effectiveness of SVM-based
algorithms, shown in Fig. 3. This hand-made dataset consists of 121
records belonging to two classes. Each record has two attributes: a
class and a value which determines how strongly the record belongs to
that class. The stars denote the data points of the positive class,
while the triangles belong to the negative class.

Fig. 3: The synthetic dataset


Table 1 shows the results of applying the SVM, LSTSVM and FLSTSVM
algorithms to the dataset. It should be noted that in this paper only
the linear versions of the three algorithms are studied.

Table 1: A comparison of classification accuracy

Algorithm    Accuracy (%)
SVM          53.0
LSTSVM       65.0
FLSTSVM      73.0

The accuracies obtained by the three algorithms are readily explained.
In SVM there is only one hyperplane responsible for classifying the
data, and because the dataset is two-dimensional, this hyperplane is a
line, shown in Fig. 4a. In the LSTSVM algorithm we have two lines for
classification. As mentioned in Section 2.3, these lines should be
nearest to the records of their corresponding class and farthest from
the records of the opposite class. Fig. 4b shows these lines for
LSTSVM. Because the data points overlap and do not lie exactly on a
line, the LSTSVM algorithm still has a large error, although its
accuracy is higher than that of SVM.

FLSTSVM also has two lines responsible for classifying the data, with
the difference that these two lines are not crisp. Fig. 4c shows the
fuzzy lines of FLSTSVM. To show the fuzzy nature of each line, we have
drawn multiple lines. As shown in the figure, these fuzzy lines
discriminate the data points better than SVM and LSTSVM. Therefore,
FLSTSVM provides higher accuracy than the other two algorithms.


4.2 Experiments on benchmark datasets 

We also performed experiments on a collection of four benchmark
datasets from the UCI machine learning repository [25]. These datasets
are Heart-Statlog, Australian Credit Approval, Liver Disorders and
Breast Cancer Wisconsin. They cover a wide range of sizes (from 198 to
690 samples) and numbers of features (from 7 to 34). Details of the
four datasets are listed in Table 2, and Table 3 lists the results of
each algorithm. As shown in the table, FLSTSVM has higher accuracy than
the other two algorithms on every dataset. It should be noted once more
that in these experiments only the linear version of FLSTSVM is
considered (and likewise for the other two algorithms). We conjecture
that the non-linear version of the proposed algorithm would outperform
the non-linear versions of SVM and LSTSVM by a larger margin.

Fig. 4: Decision lines obtained by (a) SVM, (b) LSTSVM and (c) FLSTSVM
Table 2: Datasets

Dataset                                # Features   # Samples   Missing data?
Heart-Statlog                          13           270         No
Australian Credit Approval             14           690         No
Liver Disorders                        7            345         No
Breast Cancer Wisconsin (Prognostic)   34           198         No

Table 3: Experimental results of SVM, LSTSVM, and FLSTSVM

                             Accuracy (%)
Dataset                      SVM    LSTSVM   FLSTSVM
Heart-Statlog                84.1   85.5     87.9
Australian Credit Approval   85.5   86.6     90.7
Liver Disorders              58.3   70.9     73.2
Breast Cancer                79.9   83.8     98.2

5 Conclusion 

In this paper, we enriched the LSTSVM classifier by incorporating the
theory of fuzzy sets. We proposed two novel models for fuzzy LSTSVM. In
the first model, M1, a fuzzy membership is assigned to each input point
and the hyperplanes are optimized based on the fuzzy importance degrees
of the samples. In the second model, M2, all parameters to be
identified in LSTSVM are considered to be fuzzy, and two fuzzy
hyperplanes are constructed to discriminate the target classes. We
carried out a series of experiments to compare our classifier against
SVM and LSTSVM. The results demonstrate that FLSTSVM obtains better
accuracy than the other two algorithms. As future work, we plan to
focus on the non-linear version of the fuzzy LSTSVM algorithm.

Our method can be employed in the development of intelligent systems
for decision making, pattern recognition, optimization, and control. It
permits the inclusion of vague assessments in classification problems.
FLSTSVM has a variety of potential real-life applications, such as
disease detection, image analysis, image denoising, weather
forecasting, and intrusion detection. In addition to binary
classification, it can be applied to multi-class classification
problems. FLSTSVM can also be employed in regression problems.